{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tlvY59v6PjvC"
      },
      "source": [
        "##### Copyright 2025 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "jiFUWa49Pl1F"
      },
      "outputs": [],
      "source": [
        "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XjJkne25Prps"
      },
      "source": [
        "# Synthetic data generation with Gemma 2\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_2]Synthetic_data_generation.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QdM3rXG4U-mb"
      },
      "source": [
        "## Setup\n",
        "\n",
        "### Runtime Environment\n",
        "\n",
        "  1. Click **Open in Colab**.\n",
        "  2. In the menu, go to **Runtime** > **Change runtime type**.\n",
        "  3. Under **Hardware accelerator**, select **T4 GPU**.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UMNcQErqzfML"
      },
      "source": [
        "### Hugging Face Hub Access Token\n",
        "\n",
        "Before diving into the tutorial, let's set up Gemma:\n",
        "\n",
        "1. **Create a Hugging Face Account**: If you don't have one, you can sign up for a free account [here](https://huggingface.com/join).\n",
        "2. **Access the Gemma Model**: Visit the [Gemma model page](https://huggingface.com/collections/google/gemma-2-release-667d6600fd5220e7b967f315) and accept the usage terms.\n",
        "3. **Generate a Hugging Face Token**: Go to your Hugging Face [settings page](https://huggingface.com/settings/tokens) and generate a new access token (preferably with `write` permissions). You'll need this token later in the tutorial.\n",
        "\n",
        "**Once you've completed these steps, you're ready to move on to the next section, where you'll set up environment variables in your Colab environment.**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3pjmx0qYVI5_"
      },
      "source": [
        "### Configure Your Credentials\n",
        "\n",
        "To access private models and datasets, you need to log in to the Hugging Face (HF) ecosystem.\n",
        "\n",
        "If you're using Colab, you can securely store your Hugging Face token (`HF_TOKEN`) using the Colab Secrets Manager:\n",
        "1. Open your Google Colab notebook and click on the 🔑 Secrets tab in the left panel. <img src=\"https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg\" alt=\"The Secrets tab is found on the left panel.\" width=50%>\n",
        "2. **Add Hugging Face Token**:\n",
        "- Create a new secret with the **name** `HF_TOKEN`.\n",
        "- Copy and paste your token key into the **Value** input box for `HF_TOKEN`.\n",
        "- **Toggle** the button on the left to allow notebook access to the secret"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "V7GXw4tjVS0H"
      },
      "source": [
        "This code retrieves your secrets and sets them as environment variables for use later in the tutorial."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "j6tFvOkyZz_o"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    from google.colab import userdata\n",
        "    os.environ['HF_TOKEN'] = userdata.get(\"HF_TOKEN\")\n",
        "\n",
        "if \"HF_TOKEN\" not in os.environ:\n",
        "    raise EnvironmentError(\n",
        "        \"The Hugging Face token (HF_TOKEN) could not be found in the \"\n",
        "        \"environment variables. This token is required to download the Gemma \"\n",
        "        \"models from the Hugging Face Hub. For more information about \"\n",
        "        \"HF User Access tokens, please refer to the HF documentation \"\n",
        "        \"here: https://huggingface.co/docs/hub/en/security-tokens.\"\n",
        "    )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IYZ1Y84v0Vz_"
      },
      "source": [
        "## Synthetic data generation with Gemma 2"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4Gaww41cItJy"
      },
      "source": [
        "### What's distilabel?\n",
        "\n",
        "Distilabel is a framework designed for generating synthetic data and AI feedback, tailored for engineers working on scalable, reliable pipelines based on verified research. It supports a variety of use cases, including traditional NLP tasks like classification and extraction, as well as generative scenarios like instruction following and dialogue generation. With its programmatic approach, Distilabel accelerates AI development by enabling the rapid creation of high-quality, diverse datasets.\n",
        "\n",
        "Check out [distilabel's home page](https://distilabel.argilla.io/latest/) to learn more!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_zS6gPp1VgPd"
      },
      "source": [
        "### Install dependencies\n",
        "\n",
        "We only need distilabel, which provides all the required modules.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bVaISQ8GZz_s"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\u001b[?25l   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/442.2 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K   \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[91m╸\u001b[0m \u001b[32m440.3/442.2 kB\u001b[0m \u001b[31m27.1 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m442.2/442.2 kB\u001b[0m \u001b[31m10.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h\u001b[?25l   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/480.6 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m480.6/480.6 kB\u001b[0m \u001b[31m26.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h\u001b[?25l   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/134.8 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m7.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m50.1/50.1 kB\u001b[0m \u001b[31m2.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m6.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m179.3/179.3 kB\u001b[0m \u001b[31m8.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m7.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.\u001b[0m\u001b[31m\n",
            "\u001b[0m"
          ]
        }
      ],
      "source": [
        "!pip install -q distilabel==1.4.2"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KVyMsQc7WoIu"
      },
      "source": [
        "### Initializing Gemma 2 model\n",
        "\n",
        "In this example, we don't need to initialize Gemma ourselves; this is handled by `distilabel` within the pipeline. The good news is that distilabel is fully compatible with Hugging Face Model Hub, so it will retrieve the model directly! There's no need to set up the model or tokenizer manually."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5V7GTf5uG06Y"
      },
      "outputs": [],
      "source": [
        "model_name = \"google/gemma-2-2b-it\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "l2ePrSbh-lzj"
      },
      "source": [
        "### Basic example\n",
        "\n",
        "In this example, you'll see how to use Gemma on a very small dataset used by `distilabel` for testing. It contains only 10 rows of data, as shown below. The dataset can be explored [here](https://huggingface.co/datasets/distilabel-internal-testing/instruction-dataset-mini-with-generations).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "FYXHT2ZTHQLM"
      },
      "outputs": [],
      "source": [
        "import json\n",
        "import pandas as pd\n",
        "from distilabel.llms import TransformersLLM\n",
        "from distilabel.pipeline import Pipeline\n",
        "from distilabel.steps import LoadDataFromHub, LoadDataFromDicts, LoadDataFromDicts\n",
        "from distilabel.steps.tasks import TextGeneration"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pY4uAWg3IhbE"
      },
      "source": [
        "The following defines the execution flow within the pipeline. In our case, it's quite simple:\n",
        "\n",
        "1. First, we need to load the dataset from the Hugging Face repository.\n",
        "1. Then, we define the task and specify which LLM will be used to generate new data.\n",
        "1. Finally, you need to define how the steps are connected together.\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Hg6KOzN2nrQb"
      },
      "outputs": [],
      "source": [
        "with Pipeline() as simple_pipeline:\n",
        "    load_data = LoadDataFromHub(\n",
        "        output_mappings={\"prompt\": \"instruction\"}\n",
        "    )\n",
        "    text_generation = TextGeneration(\n",
        "        llm=TransformersLLM(model=model_name)\n",
        "    )\n",
        "    load_data >> text_generation\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-H6wuvvv2vph"
      },
      "source": [
        "Let's run our simple pipeline:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "nVHzcrVg2u8T"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "7b67bc0c88c942b08df2ab640fec2aae",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "README.md:   0%|          | 0.00/656 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[01/10/25 01:43:02] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> 📝 Pipeline data will be written to                <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#866\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">866</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">'/root/.cache/distilabel/pipelines/pipeline_load_data_from_hub_0_text_gene</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">ration_0/48987b8ac6f1995d3deb7710832316099b19d037/executions/bb445a8c4068e</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">f912dc5802ce333b1e0b6df7bd1/data/steps_outputs'</span>                            <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m[01/10/25 01:43:02]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m 📝 Pipeline data will be written to                \u001b]8;id=184604;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=310108;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#866\u001b\\\u001b[2m866\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32m'/root/.cache/distilabel/pipelines/pipeline_load_data_from_hub_0_text_gene\u001b[0m \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32mration_0/48987b8ac6f1995d3deb7710832316099b19d037/executions/bb445a8c4068e\u001b[0m \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32mf912dc5802ce333b1e0b6df7bd1/data/steps_outputs'\u001b[0m                            \u001b[2m           \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> ⌛ The steps of the pipeline will be loaded in     <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#889\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">889</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         stages:                                                                    <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>          * Stage <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span>:                                                                <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>            - <span style=\"color: #008000; text-decoration-color: #008000\">'load_data_from_hub_0'</span>                                                <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>            - <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span>                                                   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m ⌛ The steps of the pipeline will be loaded in     \u001b]8;id=845053;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=843838;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#889\u001b\\\u001b[2m889\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         stages:                                                                    \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m          * Stage \u001b[1;36m0\u001b[0m:                                                                \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m            - \u001b[32m'load_data_from_hub_0'\u001b[0m                                                \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m            - \u001b[32m'text_generation_0'\u001b[0m                                                   \u001b[2m           \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> ⏳ Waiting for all the steps of stage <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> to        <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1183\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">1183</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         load<span style=\"color: #808000; text-decoration-color: #808000\">...</span>                                                                   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">            </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m ⏳ Waiting for all the steps of stage \u001b[1;36m0\u001b[0m to        \u001b]8;id=893417;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=590994;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1183\u001b\\\u001b[2m1183\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         load\u001b[33m...\u001b[0m                                                                   \u001b[2m            \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[01/10/25 01:43:04] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> ⏳ Steps from stage <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> loaded: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span>                 <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1216\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">1216</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>          * <span style=\"color: #008000; text-decoration-color: #008000\">'load_data_from_hub_0'</span> replicas: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>                                   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">            </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>          * <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span> replicas: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>                                      <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">            </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m[01/10/25 01:43:04]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m ⏳ Steps from stage \u001b[1;36m0\u001b[0m loaded: \u001b[1;36m1\u001b[0m/\u001b[1;36m2\u001b[0m                 \u001b]8;id=179412;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=211898;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1216\u001b\\\u001b[2m1216\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m          * \u001b[32m'load_data_from_hub_0'\u001b[0m replicas: \u001b[1;36m1\u001b[0m/\u001b[1;36m1\u001b[0m                                   \u001b[2m            \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m          * \u001b[32m'text_generation_0'\u001b[0m replicas: \u001b[1;36m0\u001b[0m/\u001b[1;36m1\u001b[0m                                      \u001b[2m            \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "7beae07ee4b54b54992e33ebcbf02077",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "0it [00:00, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "93b48091ad1b4cef8bc154a27ac6a414",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "config.json:   0%|          | 0.00/838 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "b65d2ac6429d4bf9adb42c8fa8f35f05",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "4c1dcd704c234693829cad68251ac81d",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "b6780c9b1c5e4ac9937f22f58ed81731",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model-00001-of-00002.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "682c5ce8c3644f758ba48d4a58c60360",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model-00002-of-00002.safetensors:   0%|          | 0.00/241M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "590efff44327479990afa8469e036c2c",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "6bea88882b5d4da6bb556018ed2a6f32",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "c89e75f47e854015ae6383f620b3197f",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer_config.json:   0%|          | 0.00/47.0k [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "fe2b6954b03b47d886387937d763611a",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "bfef84ef6c44455bb9c4e29504f75032",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "75f9a138befa4d2893232f24fae26a1e",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "Device set to use cuda:0\n"
          ]
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[01/10/25 01:45:42] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> ⏳ Steps from stage <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> loaded: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span>                 <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1216\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">1216</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>          * <span style=\"color: #008000; text-decoration-color: #008000\">'load_data_from_hub_0'</span> replicas: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>                                   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">            </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>          * <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span> replicas: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>                                      <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">            </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m[01/10/25 01:45:42]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m ⏳ Steps from stage \u001b[1;36m0\u001b[0m loaded: \u001b[1;36m2\u001b[0m/\u001b[1;36m2\u001b[0m                 \u001b]8;id=589691;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=697945;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1216\u001b\\\u001b[2m1216\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m          * \u001b[32m'load_data_from_hub_0'\u001b[0m replicas: \u001b[1;36m1\u001b[0m/\u001b[1;36m1\u001b[0m                                   \u001b[2m            \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m          * \u001b[32m'text_generation_0'\u001b[0m replicas: \u001b[1;36m1\u001b[0m/\u001b[1;36m1\u001b[0m                                      \u001b[2m            \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> ✅ All the steps from stage <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> have been loaded!   <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1220\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">1220</span></a>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m ✅ All the steps from stage \u001b[1;36m0\u001b[0m have been loaded!   \u001b]8;id=944661;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=734222;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1220\u001b\\\u001b[2m1220\u001b[0m\u001b]8;;\u001b\\\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.step.load_data_from_hub_0'</span><span style=\"font-weight: bold\">]</span> 🧬 Starting yielding      <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">step_wrapper.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#179\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">179</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         batches from generator step <span style=\"color: #008000; text-decoration-color: #008000\">'load_data_from_hub_0'</span>. Offset: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span>      <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                   </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.step.load_data_from_hub_0'\u001b[0m\u001b[1m]\u001b[0m 🧬 Starting yielding      \u001b]8;id=848930;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\u001b\\\u001b[2mstep_wrapper.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=178064;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#179\u001b\\\u001b[2m179\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         batches from generator step \u001b[32m'load_data_from_hub_0'\u001b[0m. Offset: \u001b[1;36m0\u001b[0m      \u001b[2m                   \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.step.load_data_from_hub_0'</span><span style=\"font-weight: bold\">]</span> 📨 Step                   <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">step_wrapper.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#289\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">289</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">'load_data_from_hub_0'</span> sending batch <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> to output queue             <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                   </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.step.load_data_from_hub_0'\u001b[0m\u001b[1m]\u001b[0m 📨 Step                   \u001b]8;id=172763;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\u001b\\\u001b[2mstep_wrapper.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=886605;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#289\u001b\\\u001b[2m289\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32m'load_data_from_hub_0'\u001b[0m sending batch \u001b[1;36m0\u001b[0m to output queue             \u001b[2m                   \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.step.load_data_from_hub_0'</span><span style=\"font-weight: bold\">]</span> 🏁 Finished running step  <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">step_wrapper.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#129\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">129</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">'load_data_from_hub_0'</span> <span style=\"font-weight: bold\">(</span>replica ID: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span><span style=\"font-weight: bold\">)</span>                             <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                   </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.step.load_data_from_hub_0'\u001b[0m\u001b[1m]\u001b[0m 🏁 Finished running step  \u001b]8;id=998705;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\u001b\\\u001b[2mstep_wrapper.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=92709;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#129\u001b\\\u001b[2m129\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32m'load_data_from_hub_0'\u001b[0m \u001b[1m(\u001b[0mreplica ID: \u001b[1;36m0\u001b[0m\u001b[1m)\u001b[0m                             \u001b[2m                   \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.step.text_generation_0'</span><span style=\"font-weight: bold\">]</span> 📦 Processing batch <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> in     <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">step_wrapper.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#229\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">229</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span> <span style=\"font-weight: bold\">(</span>replica ID: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span><span style=\"font-weight: bold\">)</span>                                <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                   </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.step.text_generation_0'\u001b[0m\u001b[1m]\u001b[0m 📦 Processing batch \u001b[1;36m0\u001b[0m in     \u001b]8;id=831501;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\u001b\\\u001b[2mstep_wrapper.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=122735;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#229\u001b\\\u001b[2m229\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32m'text_generation_0'\u001b[0m \u001b[1m(\u001b[0mreplica ID: \u001b[1;36m0\u001b[0m\u001b[1m)\u001b[0m                                \u001b[2m                   \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "The 'batch_size' attribute of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'self.max_batch_size' attribute instead.\n"
          ]
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[01/10/25 01:47:03] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.step.text_generation_0'</span><span style=\"font-weight: bold\">]</span> 📨 Step <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span>  <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">step_wrapper.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#289\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">289</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         sending batch <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> to output queue                                    <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                   </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m[01/10/25 01:47:03]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.step.text_generation_0'\u001b[0m\u001b[1m]\u001b[0m 📨 Step \u001b[32m'text_generation_0'\u001b[0m  \u001b]8;id=16491;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\u001b\\\u001b[2mstep_wrapper.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=360375;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#289\u001b\\\u001b[2m289\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         sending batch \u001b[1;36m0\u001b[0m to output queue                                    \u001b[2m                   \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.step.text_generation_0'</span><span style=\"font-weight: bold\">]</span> 🏁 Finished running step     <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">step_wrapper.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#129\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">129</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span> <span style=\"font-weight: bold\">(</span>replica ID: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span><span style=\"font-weight: bold\">)</span>                                <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                   </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.step.text_generation_0'\u001b[0m\u001b[1m]\u001b[0m 🏁 Finished running step     \u001b]8;id=272661;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\u001b\\\u001b[2mstep_wrapper.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=895644;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#129\u001b\\\u001b[2m129\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32m'text_generation_0'\u001b[0m \u001b[1m(\u001b[0mreplica ID: \u001b[1;36m0\u001b[0m\u001b[1m)\u001b[0m                                \u001b[2m                   \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "c9cafeb24fae4c039b69ff52238fb14f",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Generating train split: 0 examples [00:00, ? examples/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "simple_pipeline_output = simple_pipeline.run(\n",
        "    parameters={\n",
        "        load_data.name: {\n",
        "            \"repo_id\": \"distilabel-internal-testing/instruction-dataset-mini\",\n",
        "            \"split\": \"test\"\n",
        "        },\n",
        "    }\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4yA-e39f7Zcf"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "summary": "{\n  \"name\": \"pd\",\n  \"rows\": 10,\n  \"fields\": [\n    {\n      \"column\": \"instruction\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 10,\n        \"samples\": [\n          \"Joe Biden is the Nth president of the United States. What is N?\",\n          \"Write a plot summary for a comedic novel involving Elon Musk and sea travel.\",\n          \"Tell me what the following code does\\r\\n\\r\\nimport json\\r\\ncsv_file = open('csv_file.txt', 'r')\\r\\njson_list = []\\r\\nfor line in csv_file.readlines():\\r\\n    club, city, country = line.strip().split(',')\\r\\n    json_dict = {'club': club,\\r\\n                 'city': city,\\r\\n                 'country': country\\r\\n    }\\r\\n    json_list.append(json_dict)\\r\\ncsv_file.close()\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"completion\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 10,\n        \"samples\": [\n          \"46\",\n          \"Elon Musk hires a team of experts to build the ultimate yacht, but when the yacht is completed, he realizes that he has no idea how to sail it. With the help of a quirky crew and a fearless captain, the playboy embarks on a wild and hilarious adventure across the open seas, where the crew have to keep Elon alive despite his inability to do anything himself. All the while, Elon takes credit for their hard work.\",\n          \"In short, the code reads in a CSV file containing 3 columns (club, city, country), and converts it into a list of jsons.\\r\\n\\r\\nIn more detail:\\r\\nThis code imports the JSON package, then it opens a text file called 'csv_file.txt' and assigns this file to the variable 'csv_file'. Next, an empty list called json_list is created. The code then loops through the lines of the content of the text file, strips the lines of all white spaces, and splits each word by a comma. \\r\\nThen a dictionary is created with 3 keys (club, city, and country) and 3 values (club, city, and country) respectively. Then this json_dict now in JSON format is added to the json_list created earlier. \\r\\nThe csv_file text file is then closed.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"meta\",\n      \"properties\": {\n        \"dtype\": \"object\",\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"generation\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 10,\n        \"samples\": [\n          \"Joe Biden is the **46th** president of the United States. \\n\\nHere's how to figure that out:\\n\\n* The US has had 46 presidents since its founding. \\n*  The first president was George Washington. \\n* You can find a complete list of US Presidents online, or in a history book! \\n\",\n          \"##  Spacefaring Shenanigans: A Musk-tastic Tale\\n\\n**Logline:** When Elon Musk's ambitious plan to colonize Mars goes awry, he finds himself stranded on a rogue cruise ship sailing the high seas with a motley crew of eccentric billionaires, washed-up celebrities, and a talking parrot named Captain Quackers. \\n\\n**Synopsis:**\\n\\nElon Musk, ever the visionary, has finally achieved his dream \\u2013 launching a fleet of self-sustaining Martian colonies. But disaster strikes! His rocket malfunctions mid-flight, leaving him marooned in the middle of the Pacific Ocean aboard a luxurious, but decidedly\",\n          \"This Python code snippet reads data from a CSV file and converts it into a list of dictionaries, which is then stored as a JSON object.\\r\\n\\r\\nHere's a breakdown:\\r\\n\\r\\n**1. Importing the `json` module:**\\n\\n   - `import json`: This line imports the `json` module, which provides functions for working with JSON (JavaScript Object Notation) data. \\n\\n**2. Opening the CSV file:**\\n\\n   - `csv_file = open('csv_file.txt', 'r')`: This opens the file named \\\"csv_file.txt\\\" in read mode ('r').\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"distilabel_metadata\",\n      \"properties\": {\n        \"dtype\": \"object\",\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"model_name\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 1,\n        \"samples\": [\n          \"google/gemma-2-2b-it\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
              "type": "dataframe"
            },
            "text/html": [
              "\n",
              "  <div id=\"df-8295af6e-5be1-4892-b6b5-e4636e6dbd23\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>instruction</th>\n",
              "      <th>completion</th>\n",
              "      <th>meta</th>\n",
              "      <th>generation</th>\n",
              "      <th>distilabel_metadata</th>\n",
              "      <th>model_name</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Arianna has 12 chocolates more than Danny. Dan...</td>\n",
              "      <td>Denote the number of chocolates each person ha...</td>\n",
              "      <td>{'category': 'Question Answering', 'completion...</td>\n",
              "      <td>Here's how to solve this problem step-by-step:...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Write a plot summary for a comedic novel invol...</td>\n",
              "      <td>Elon Musk hires a team of experts to build the...</td>\n",
              "      <td>{'category': 'Generation', 'completion': 'Elon...</td>\n",
              "      <td>##  Spacefaring Shenanigans: A Musk-tastic Tal...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Create a 3 turn conversation between a custome...</td>\n",
              "      <td>Clerk: How are you doing today?\\nCustomer: Gre...</td>\n",
              "      <td>{'category': 'Summarization', 'completion': 'C...</td>\n",
              "      <td>## Grocery Store Conversation\\n\\n**Customer:**...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>Write a poem about the sun and moon.</td>\n",
              "      <td>The sun and the moon, the guards from the sky\\...</td>\n",
              "      <td>{'category': 'Generation', 'completion': 'The ...</td>\n",
              "      <td>The Sun, a fiery orb of gold,\\nA blazing heart...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Does Searle believe that AI can think? Explain...</td>\n",
              "      <td>No, Searle does not believe that AI can think....</td>\n",
              "      <td>{'category': 'Commonsense/logic', 'completion'...</td>\n",
              "      <td>Based on the provided text, John Searle believ...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>Tell me what the following code does\\r\\n\\r\\nim...</td>\n",
              "      <td>In short, the code reads in a CSV file contain...</td>\n",
              "      <td>{'category': 'Natural language to code', 'comp...</td>\n",
              "      <td>This Python code snippet reads data from a CSV...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>Can you find and correct any logical errors in...</td>\n",
              "      <td>The original code counts the number of lowerca...</td>\n",
              "      <td>{'category': 'Natural language to code', 'comp...</td>\n",
              "      <td>The provided code snippet is actually correct ...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>I need you to write a resignation letter to my...</td>\n",
              "      <td>Hi Albert,\\nPlease accept this letter as forma...</td>\n",
              "      <td>{'category': 'Brainstorm', 'completion': 'Hi A...</td>\n",
              "      <td>[Your Address]\\n[City, State, Zip Code]\\n[Emai...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>Joe Biden is the Nth president of the United S...</td>\n",
              "      <td>46</td>\n",
              "      <td>{'category': 'Commonsense/logic', 'completion'...</td>\n",
              "      <td>Joe Biden is the **46th** president of the Uni...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>Write a four-sentence horror story about sleep...</td>\n",
              "      <td>I woke up at 7 am after having nightmares the ...</td>\n",
              "      <td>{'category': 'Generation', 'completion': 'I wo...</td>\n",
              "      <td>The clock ticked, each second echoing the fran...</td>\n",
              "      <td>{'raw_input_text_generation_0': [{'content': '...</td>\n",
              "      <td>google/gemma-2-2b-it</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-8295af6e-5be1-4892-b6b5-e4636e6dbd23')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-8295af6e-5be1-4892-b6b5-e4636e6dbd23 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-8295af6e-5be1-4892-b6b5-e4636e6dbd23');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-5f3aa959-7d72-4cd1-b7ff-b40b4b386cc2\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-5f3aa959-7d72-4cd1-b7ff-b40b4b386cc2')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-5f3aa959-7d72-4cd1-b7ff-b40b4b386cc2 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "text/plain": [
              "                                         instruction  \\\n",
              "0  Arianna has 12 chocolates more than Danny. Dan...   \n",
              "1  Write a plot summary for a comedic novel invol...   \n",
              "2  Create a 3 turn conversation between a custome...   \n",
              "3               Write a poem about the sun and moon.   \n",
              "4  Does Searle believe that AI can think? Explain...   \n",
              "5  Tell me what the following code does\\r\\n\\r\\nim...   \n",
              "6  Can you find and correct any logical errors in...   \n",
              "7  I need you to write a resignation letter to my...   \n",
              "8  Joe Biden is the Nth president of the United S...   \n",
              "9  Write a four-sentence horror story about sleep...   \n",
              "\n",
              "                                          completion  \\\n",
              "0  Denote the number of chocolates each person ha...   \n",
              "1  Elon Musk hires a team of experts to build the...   \n",
              "2  Clerk: How are you doing today?\\nCustomer: Gre...   \n",
              "3  The sun and the moon, the guards from the sky\\...   \n",
              "4  No, Searle does not believe that AI can think....   \n",
              "5  In short, the code reads in a CSV file contain...   \n",
              "6  The original code counts the number of lowerca...   \n",
              "7  Hi Albert,\\nPlease accept this letter as forma...   \n",
              "8                                                 46   \n",
              "9  I woke up at 7 am after having nightmares the ...   \n",
              "\n",
              "                                                meta  \\\n",
              "0  {'category': 'Question Answering', 'completion...   \n",
              "1  {'category': 'Generation', 'completion': 'Elon...   \n",
              "2  {'category': 'Summarization', 'completion': 'C...   \n",
              "3  {'category': 'Generation', 'completion': 'The ...   \n",
              "4  {'category': 'Commonsense/logic', 'completion'...   \n",
              "5  {'category': 'Natural language to code', 'comp...   \n",
              "6  {'category': 'Natural language to code', 'comp...   \n",
              "7  {'category': 'Brainstorm', 'completion': 'Hi A...   \n",
              "8  {'category': 'Commonsense/logic', 'completion'...   \n",
              "9  {'category': 'Generation', 'completion': 'I wo...   \n",
              "\n",
              "                                          generation  \\\n",
              "0  Here's how to solve this problem step-by-step:...   \n",
              "1  ##  Spacefaring Shenanigans: A Musk-tastic Tal...   \n",
              "2  ## Grocery Store Conversation\\n\\n**Customer:**...   \n",
              "3  The Sun, a fiery orb of gold,\\nA blazing heart...   \n",
              "4  Based on the provided text, John Searle believ...   \n",
              "5  This Python code snippet reads data from a CSV...   \n",
              "6  The provided code snippet is actually correct ...   \n",
              "7  [Your Address]\\n[City, State, Zip Code]\\n[Emai...   \n",
              "8  Joe Biden is the **46th** president of the Uni...   \n",
              "9  The clock ticked, each second echoing the fran...   \n",
              "\n",
              "                                 distilabel_metadata            model_name  \n",
              "0  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  \n",
              "1  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  \n",
              "2  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  \n",
              "3  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  \n",
              "4  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  \n",
              "5  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  \n",
              "6  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  \n",
              "7  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  \n",
              "8  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  \n",
              "9  {'raw_input_text_generation_0': [{'content': '...  google/gemma-2-2b-it  "
            ]
          },
          "execution_count": 10,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Display the output dataset as Pandas DF\n",
        "pd.DataFrame(simple_pipeline_output[\"default\"][\"train\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vkHw6Rfk24r-"
      },
      "source": [
        "As you can see, a new column, `generation`, has been added to the dataset. This column contains the newly generated data."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tdxwuVKs-nyz"
      },
      "source": [
        "### Create your own dataset and pipeline\n",
        "It’s always better to apply the tool to real-life examples. In the following cells, we will explore how to generate facts about various items. These items will be provided as a list."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kC7tiSCG_dxc"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[{'instruction': 'Generate 3 fun facts about dogs'},\n",
              " {'instruction': 'Generate 3 fun facts about cats'},\n",
              " {'instruction': 'Generate 3 fun facts about birds'}]"
            ]
          },
          "execution_count": 11,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from typing import List\n",
        "\n",
        "def prepare_data(subjects: List[str], num_of_fact: int = 3):\n",
        "    general_prompt = f\"Generate {num_of_fact} fun facts about \"\n",
        "    return [{\"instruction\": general_prompt + subject} for subject in subjects]\n",
        "\n",
        "data = prepare_data([\"dogs\", \"cats\", \"birds\"])\n",
        "data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZR9bRvQo3lp4"
      },
      "source": [
        "Here, we will define our pipeline. It won’t be much different from the previous one, but we will use our dictionary data as the input and won’t use any input parameters when calling the `run(...)` method, allowing for more output tokens."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jhtX5dLr_k3I"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[01/10/25 01:56:39] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> 📝 Pipeline data will be written to                <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#866\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">866</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">'/root/.cache/distilabel/pipelines/pipeline_load_data_text_generation_0/20</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">f1ed8f53124e9d663e7cd3a90e46c439cbbcbb/executions/4853c9c59c87453543087ce0</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">2acbe727da4f37c7/data/steps_outputs'</span>                                       <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m[01/10/25 01:56:39]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m 📝 Pipeline data will be written to                \u001b]8;id=889391;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=84376;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#866\u001b\\\u001b[2m866\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32m'/root/.cache/distilabel/pipelines/pipeline_load_data_text_generation_0/20\u001b[0m \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32mf1ed8f53124e9d663e7cd3a90e46c439cbbcbb/executions/4853c9c59c87453543087ce0\u001b[0m \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32m2acbe727da4f37c7/data/steps_outputs'\u001b[0m                                       \u001b[2m           \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> ⌛ The steps of the pipeline will be loaded in     <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#889\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">889</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         stages:                                                                    <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>          * Stage <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span>:                                                                <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>            - <span style=\"color: #008000; text-decoration-color: #008000\">'load_data'</span> <span style=\"font-weight: bold\">(</span>results cached, won't be loaded and executed<span style=\"font-weight: bold\">)</span>            <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>            - <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span>                                                   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">           </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m ⌛ The steps of the pipeline will be loaded in     \u001b]8;id=856875;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=440939;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#889\u001b\\\u001b[2m889\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         stages:                                                                    \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m          * Stage \u001b[1;36m0\u001b[0m:                                                                \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m            - \u001b[32m'load_data'\u001b[0m \u001b[1m(\u001b[0mresults cached, won't be loaded and executed\u001b[1m)\u001b[0m            \u001b[2m           \u001b[0m\n",
              "\u001b[2;36m                    \u001b[0m            - \u001b[32m'text_generation_0'\u001b[0m                                                   \u001b[2m           \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> ⏳ Waiting for all the steps of stage <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> to        <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1183\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">1183</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         load<span style=\"color: #808000; text-decoration-color: #808000\">...</span>                                                                   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">            </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m ⏳ Waiting for all the steps of stage \u001b[1;36m0\u001b[0m to        \u001b]8;id=618272;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=247011;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1183\u001b\\\u001b[2m1183\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         load\u001b[33m...\u001b[0m                                                                   \u001b[2m            \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "Device set to use cuda:0\n"
          ]
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[01/10/25 01:56:54] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> ⏳ Steps from stage <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> loaded: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>                 <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1216\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">1216</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>          * <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span> replicas: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>                                      <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">            </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m[01/10/25 01:56:54]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m ⏳ Steps from stage \u001b[1;36m0\u001b[0m loaded: \u001b[1;36m1\u001b[0m/\u001b[1;36m1\u001b[0m                 \u001b]8;id=281311;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=458959;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1216\u001b\\\u001b[2m1216\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m          * \u001b[32m'text_generation_0'\u001b[0m replicas: \u001b[1;36m1\u001b[0m/\u001b[1;36m1\u001b[0m                                      \u001b[2m            \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.pipeline'</span><span style=\"font-weight: bold\">]</span> ✅ All the steps from stage <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> have been loaded!   <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">base.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1220\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">1220</span></a>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.pipeline'\u001b[0m\u001b[1m]\u001b[0m ✅ All the steps from stage \u001b[1;36m0\u001b[0m have been loaded!   \u001b]8;id=565150;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py\u001b\\\u001b[2mbase.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=531836;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/base.py#1220\u001b\\\u001b[2m1220\u001b[0m\u001b]8;;\u001b\\\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.step.text_generation_0'</span><span style=\"font-weight: bold\">]</span> 📦 Processing batch <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> in     <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">step_wrapper.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#229\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">229</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span> <span style=\"font-weight: bold\">(</span>replica ID: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span><span style=\"font-weight: bold\">)</span>                                <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                   </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.step.text_generation_0'\u001b[0m\u001b[1m]\u001b[0m 📦 Processing batch \u001b[1;36m0\u001b[0m in     \u001b]8;id=587048;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\u001b\\\u001b[2mstep_wrapper.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=540421;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#229\u001b\\\u001b[2m229\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32m'text_generation_0'\u001b[0m \u001b[1m(\u001b[0mreplica ID: \u001b[1;36m0\u001b[0m\u001b[1m)\u001b[0m                                \u001b[2m                   \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "The 'batch_size' attribute of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'self.max_batch_size' attribute instead.\n"
          ]
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[01/10/25 01:57:30] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.step.text_generation_0'</span><span style=\"font-weight: bold\">]</span> 📨 Step <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span>  <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">step_wrapper.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#289\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">289</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         sending batch <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span> to output queue                                    <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                   </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m[01/10/25 01:57:30]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.step.text_generation_0'\u001b[0m\u001b[1m]\u001b[0m 📨 Step \u001b[32m'text_generation_0'\u001b[0m  \u001b]8;id=732010;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\u001b\\\u001b[2mstep_wrapper.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=803842;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#289\u001b\\\u001b[2m289\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         sending batch \u001b[1;36m0\u001b[0m to output queue                                    \u001b[2m                   \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO    </span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'distilabel.step.text_generation_0'</span><span style=\"font-weight: bold\">]</span> 🏁 Finished running step     <a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">step_wrapper.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#129\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">129</span></a>\n",
              "<span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">                    </span>         <span style=\"color: #008000; text-decoration-color: #008000\">'text_generation_0'</span> <span style=\"font-weight: bold\">(</span>replica ID: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span><span style=\"font-weight: bold\">)</span>                                <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">                   </span>\n",
              "</pre>\n"
            ],
            "text/plain": [
              "\u001b[2;36m                   \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO    \u001b[0m \u001b[1m[\u001b[0m\u001b[32m'distilabel.step.text_generation_0'\u001b[0m\u001b[1m]\u001b[0m 🏁 Finished running step     \u001b]8;id=326651;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py\u001b\\\u001b[2mstep_wrapper.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=823656;file:///usr/local/lib/python3.10/dist-packages/distilabel/pipeline/step_wrapper.py#129\u001b\\\u001b[2m129\u001b[0m\u001b]8;;\u001b\\\n",
              "\u001b[2;36m                    \u001b[0m         \u001b[32m'text_generation_0'\u001b[0m \u001b[1m(\u001b[0mreplica ID: \u001b[1;36m0\u001b[0m\u001b[1m)\u001b[0m                                \u001b[2m                   \u001b[0m\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "3ecfb359257e43e283fbfd30039555bb",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Generating train split: 0 examples [00:00, ? examples/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "with Pipeline() as custom_pipeline:\n",
        "    load_data = LoadDataFromDicts(name=\"load_data\", data=data),\n",
        "    text_generation = TextGeneration(\n",
        "        llm=TransformersLLM(\n",
        "            model=model_name,\n",
        "            generation_kwargs={\"max_new_tokens\": 256}\n",
        "        )\n",
        "    )\n",
        "    load_data >> text_generation\n",
        "\n",
        "custom_output = custom_pipeline.run()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JGpQAqY53tRg"
      },
      "source": [
        "Let's see the results:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "c2iZxLqYAUQB"
      },
      "outputs": [
        {
          "data": {
            "text/markdown": [
              "**Generate 3 fun facts about dogs**\n",
              "\n",
              "\n",
              "Here are three fun facts about dogs:\n",
              "\n",
              "1. **Dogs can recognize themselves in mirrors!**  This is a pretty impressive feat, and scientists believe it's linked to their social intelligence and ability to understand their own reflection as a separate entity. \n",
              "2. **A dog's sense of smell is 10,000 times stronger than a human's.** That means they can detect scents we can barely even perceive! This incredible sense helps them track prey, find lost people, and navigate the world around them.\n",
              "3. **Dogs have different \"barking\" languages.** Just like humans have dialects, dogs have unique barks that convey different emotions and intentions. Some breeds even have distinct vocalizations for specific situations or commands.\n",
              "\n",
              "\n",
              "Let me know if you want more fun dog facts! 🐶 \n",
              "\n",
              "---\n",
              "**Generate 3 fun facts about cats**\n",
              "\n",
              "\n",
              "Here are three fun facts about cats:\n",
              "\n",
              "1. **Cats have a third eyelid!**  It's called the nictitating membrane and it acts like a built-in shield, protecting their eyes from dust, debris, and even scratches. It also helps keep their eyes moist. Pretty cool, right?\n",
              "2. **Cats can purr at different frequencies.** While we often associate purring with contentment, cats actually purr for various reasons, including healing themselves, communicating with each other, and even regulating their blood pressure. Some purrs are high-pitched and others are deep and rumbling – each one has its own purpose!\n",
              "3. **Cats have amazing night vision.** Thanks to special rod cells in their retinas, cats can see much better in low light than humans. They can even see ultraviolet light, which is invisible to us! This makes them excellent hunters, especially at dusk and dawn.\n",
              "\n",
              "\n",
              "Let me know if you want more cat facts! 😻 \n",
              "\n",
              "---\n",
              "**Generate 3 fun facts about birds**\n",
              "\n",
              "\n",
              "Here are three fun facts about birds:\n",
              "\n",
              "1. **Birds can't taste sweet things!**  While they have taste buds, their sense of smell is much stronger than their sense of taste. This means they rely more on scent to find food and mates. \n",
              "2. **Some birds can change their feathers in a matter of hours.**  This isn't just for show! Birds use their feathers for camouflage, insulation, and even communication. They can molt (shed old feathers) and grow new ones quickly depending on the season or environmental changes.\n",
              "3. **Hummingbirds are the only birds that can fly backwards.** Their unique wing structure allows them to rotate their wings at different angles, giving them incredible maneuverability.\n",
              "\n",
              "\n",
              "Let me know if you want some more bird trivia! 🐦 \n",
              "\n",
              "---"
            ],
            "text/plain": [
              "<IPython.core.display.Markdown object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "from IPython.display import display, Markdown, Latex\n",
        "output = []\n",
        "for item in custom_output[\"default\"][\"train\"]:\n",
        "    prompt, response = item[\"instruction\"], item[\"generation\"]\n",
        "    output.extend([f\"**{prompt}**\\n\\n\", response, \"---\"])\n",
        "display(Markdown(\"\\n\".join(output)))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iinyS4uMDS4T"
      },
      "source": [
        "## What's Next?\n",
        "\n",
        "That's it! If you're wondering how to make your chatbot even better, check out the following resources:\n",
        "\n",
        "- **Explore the Gemma family models:** Visit [Gemma Open Models](https://ai.google.dev/gemma) to learn about the latest updates regarding the Gemma family models, new capabilities, versions, and more.\n",
        "- **Check out Distilabel's Components Gallery:** The [components gallery](https://distilabel.argilla.io/latest/components-gallery/) showcases various modules that can be used within a pipeline to achieve the desired results."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "name": "[Gemma_2]Synthetic_data_generation.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
